GENETIC POLYMORPHISMS WHICH ARE ASSOCIATED WITH
AUTISM SPECTRUM DISORDERS 



The subject matter of this application was made with support from the United
States Government under Grants No. R01 AA08666, R01 NS24287, R01 HD34295,
R01 HD34969, and 2P30 ES01247 from the National Institutes of Health and Grant
No. R824758 from the Environmental Protection Agency. The United States
Government may retain certain rights.

The present application claims the benefit of U.S. Provisional Patent 
Application Serial No. 60/049,803, filed June 17, 1997. 

FIELD OF THE INVENTION 

The present invention relates to a method of screening subjects for genetic 
markers associated with autism. The invention further relates to isolated nucleic acids
having polymorphisms associated with autism, the polypeptide products of those 
nucleic acids, and antibodies specific to the polypeptides produced by the mutated 
genes. 

BACKGROUND OF THE INVENTION 

Autism is a behaviorally defined syndrome characterized by impairment of 
social interaction, deficiency or abnormality of speech development, and limited 
activities and interest (American Psychiatric Association, 1994). The last category 
includes such abnormal behaviors as fascination with spinning objects, repetitive 
stereotypic movements, obsessive interests, and abnormal aversion to change in the 
environment. Symptoms are present by 30 months of age. The prevalence rate in 
recent Canadian studies using total ascertainment is over 1/1,000 (Bryson, S.E. et al., 
J. Child Psychol. Psychiat., 29, 433 (1988)).

Identifying the cause of the disease has been difficult, in part because the
symptoms do not suggest a brain region or system where injury would result in the
diagnostic set of behaviors. Further, the nature of the behaviors included in the
criteria precludes an animal model of the diagnostic symptoms and makes it
difficult to relate much of the experimental literature on brain injuries to the
symptoms of autism.



Several quantitative changes have been observed in autistic brains at autopsy.
An elevation of about 100 g in brain weight has been reported (Bauman, M.L. and
Kemper, T.L., Neurology, 35, 866 (1985)). While attempts to find anatomical changes
in the cerebral cortex have been unsuccessful (Williams, R.S. et al., Arch. Neurol., 37,
749 (1980); Coleman, P.D. et al., J. Autism Dev. Disord., 15, 245 (1985)), several
brains have been found to have elevated neuron packing density in structures of the
limbic system (Bauman, M.L. and Kemper, T.L., Neurology, 35, 866 (1985)),
including the amygdala, hippocampus, septal nuclei and mammillary body. Multiple
cases in multiple labs have been found to have abnormalities of the cerebellum. A
deficiency of Purkinje cell and granule cell number, as well as reduced cell counts in
the deep nuclei of the cerebellum and neuron shrinkage in the inferior olive, have
been reported (Bauman, M.L. and Kemper, T.L., Neurology, 35, 866 (1985); Bauman,
M.L. and Kemper, T.L., Neurology, 36 (suppl. 1), 190 (1986); Bauman, M.L. and
Kemper, T.L., The Neurobiology of Autism, Johns Hopkins University Press, 119
(1994); Ritvo, E.R. et al., Am. J. Psychiat., 143, 862 (1986); Kemper, T.L. and
Bauman, M.L., Neurobiology of Infantile Autism, Elsevier Science Publishers, 43
(1992)).

Imaging studies have allowed examination of some anatomical characteristics
in living autistic patients, providing larger samples than those available for histologic
evaluation. In general, these confirm that the size of the brain in autistic individuals is
not reduced and that most regions are also normal in size (Piven, J. et al., Biol.
Psychiat., 31, 491 (1992)). Reports of size reductions in the brainstem have been
inconsistent (Gaffney, G.R. et al., Biol. Psychiat., 24, 578 (1988); Hsu, M. et al.,
Arch. Neurol., 48, 1160 (1991)), but a new, larger study suggests that the midbrain,
pons, and medulla are smaller in autistic cases than in controls (Hashimoto, T. et al.,
J. Aut. Dev. Disord., 25, 1 (1995)). In light of the histological effects reported for the
cerebellum, it is interesting that the one region repeatedly identified as abnormal in
imaging studies is the neocerebellar vermis (lobules VI and VII; Gaffney, G.R. et al.,
Am. J. Dis. Child., 141, 1330 (1987); Courchesne, E. et al., N. Engl. J. Med., 318,
1349 (1988); Hashimoto, T. et al., J. Aut. Dev. Disord., 25, 1 (1995)). Not all
comparisons have found a difference in neocerebellar size (Piven, J. et al., Biol.
Psychiat., 31, 491 (1992); Kleiman, M.D. et al., Neurology, 42, 753 (1992)), but a
recent reevaluation of positive and negative studies (Courchesne, E. et al., Neurology,
44, 214 (1994)) indicates that a few autistic cases have hyperplasia of the
neocerebellar vermis, while many have hypoplasia. Small samples of this
heterogeneous population could explain disparate results regarding the size of the
neocerebellum in autism. The proposal that the cerebellum in autistic cases can be
either large or small is reasonable from an embryological standpoint, because injuries
to the developing brain are sometimes followed by rebounds of neurogenesis (e.g.,
Andreoli, J. et al., Am. J. Anat., 137, 87 (1973); Bohn, M.C. and Lauder, J.M., Dev.
Neurosci., 1, 250 (1978); Bohn, M.C., Neuroscience, 5, 2003 (1980)), and it is
possible that such rebounds could overshoot the normal cell number. Further,
because increased cell density has been observed in the limbic system, the cerebellum
is not the only brain region in which some form of overgrowth might account for the
neuroanatomy of autistic cases. It may well be that some autism-inducing injuries
occur just prior to a period of rapid growth for the cerebellar lobules in question or the
limbic system, leading to excess growth, while other injuries continue to be damaging
during the period of rapid growth, leading to hypoplasia. However, the hypothesis
that autism occurs with both hypoplastic and hyperplastic cerebella calls into question
whether cerebellar anomalies play a major role in autistic symptoms.

A particularly instructive result has appeared in an MRI study on the cerebral
cortex (Piven, J. et al., Am. J. Psychiat., 14, 734 (1992)). Of a small sample of
autistic cases, the majority showed gyral anomalies (e.g., patches of pachygyria).
However, the abnormal areas were not located in the same regions from case to case.
That is, while the functional symptoms were similar in all the subjects, the brain
damage observed was not. The investigators argue convincingly that the cortical
anomalies were not responsible for the functional abnormalities. This is a central
problem in all attempts to screen for pathology in living patients or in autopsy cases.
While abnormalities may be present, it is not necessarily true that they are related to
the symptoms of autism.

To teratologists, the physical anomalies of a neonate, child, or adult can serve
as a guide to when the embryo was injured. Years of research have amplified the
details of that timetable for the nervous system (Rodier, P.M., Dev. Med. Child
Neurol., 22, 525 (1980); Bayer, S.A. et al., Neurotoxicology, 14, 83 (1993)). In the
case of autism, lack of specific information on the neuroanatomy associated with the
disease has made it difficult to estimate the stage of development when the disorder
arises. However, in 1993, Miller and Stromland reported a finding that conclusively
identified the time of origin for some cases. They observed that the rate of autism
was 33% in people exposed to thalidomide between the 20th and 24th days of
gestation, and 0% in cases exposed at other times (Stromland, K. et al., Devel. Med.
Child. Neurol., 36, 351 (1994)). Their deduction regarding the time of injury was not
based on neuroanatomy, which was not known in their living subjects. Instead, it was
based on the external stigmata of the cases.

Because thousands of thalidomide-exposed offspring have been evaluated for
somatic malformations, the array of injuries associated with the drug is well-known,
and the time when each arises has been carefully defined (Miller, M.T., Trans. Am.
Ophthalmol. Soc., 89, 623 (1991)). Of five cases of thalidomide-induced autism, four
had malformations of the ears, without limb malformation, and the fifth had
malformation of the ears, forelimb, and hindlimb. Thalidomide is not teratogenic
before the 20th day of gestation. Starting on day 20, exposure causes ear
malformation and abnormalities of the thumb. Limb malformations (other than those
of the thumb) first appear with exposure on the 25th day, with effects moving from
the forelimb to the hindlimb as exposure occurs at later stages. After the 35th day,
thalidomide produces no malformations. Thus, the cases with malformations
restricted to the ear must have been exposed before day 25, and the one patient with
multiple malformations can only be explained as a case of repeated injuries at several
stages of development.

In fact, the idea that autism might arise very early in gestation was suggested
long ago. Steg and Rapoport (J. Aut. Child. Schiz., 5, 299 (1975)) noted the
significant increase in minor physical anomalies among children with autism, and
realized that they indicated an injury in the first trimester. Several studies of minor
malformations have found ear effects to be the most common anomalies in autism
(Walker, H.A., J. Aut. Child. Schiz., 7, 165 (1977); Campbell, M. et al., Am. J.
Psychiat., 135, 573 (1978)), and the most recent study shows that they are not only the
best discriminator between people with autism and normal controls, but also the only
anomaly that discriminates autism from other developmental disabilities (Rodier,
P.M. et al., Teratology, 55, 319 (1997)). Ear anomalies are among the earliest of all
minor physical malformations in their time of origin.

External malformations are not the only evidence which puts the time of
injury in autism at the time of neural tube closure. The cranial nerve dysfunctions
observed in the patients with autism secondary to thalidomide exposure - facial nerve
palsy, Duane syndrome (lack of abducens innervation with reinnervation of the lateral
rectus by the oculomotor nerve), abnormal lacrimation, gaze paresis, and hearing
deficits (Stromland, K. et al., Devel. Med. Child. Neurol., 36, 351 (1994)) - suggest
that the earliest-forming structures of the brain stem were damaged, and it is now
known that these form during neural tube closure (Bayer, S.A. et al.,
Neurotoxicology, 14, 83 (1993)). Subsequent studies have shown that a human brain
from a patient with autism has the same pattern of brain stem injury predicted by the
thalidomide cases (Rodier, P.M. et al., J. Comp. Neurol., 370, 247 (1996)). Perhaps
even more importantly, the autopsied brain has a shortening of the brain stem in the
region of the fifth rhombomere, and is missing two of the nuclei known to form from
that embryological structure. The rhombomeres exist so briefly (Streeter, G.L., Contr.
Embryol. Carneg. Instn., 30, 213 (1948)) that the evidence that one failed to form is
conclusive in pinpointing the time of injury. Like the thalidomide cases, the autopsy
case could have been injured only at the time of neural tube closure.

The effect of injury around neural tube closure has been tested experimentally,
to see whether it can produce anatomical results like those suspected in the
thalidomide cases and observed in human brain. Animals exposed during the critical
period to valproic acid, a teratogen with effects similar to thalidomide, which has also
been associated with autism (Christianson, A.L. et al., Devel. Med. Child. Neurol., 36,
357 (1994); Williams, P.O. et al., Dev. Med. Child. Neurol., 39, 632 (1997)) exhibit
reductions in the number of cranial nerve motor neurons (Rodier, P.M. et al., J. Comp.
Neurol., 370, 247 (1996)). They are distinguished from controls by shortening of the
hindbrain in the region which forms from the fifth rhombomere, just as the autopsied
brain was (Rodier, P.M. et al., Teratology, 55, 319 (1997)). Additional data suggest
that the animal model has secondary changes in the cerebellum like those reported in
some human cases of autism (Ingram, J.L. et al., Teratology, 53, 86 (1996)).

It has long been known that heritable factors play an important role in the
etiology of autism. This was demonstrated by the original twin studies of Folstein
and Rutter (J. Child Psychol. Psychiat., 18, 297 (1977)) and the subsequent addition
of more twin pairs to the sample has only increased the estimate of the proportion of
cases suspected to have a genetic basis (e.g., Bailey, A. et al., Psychol. Med., 25, 63
(1995); LeCouteur, A. et al., J. Child Psychol. Psychiat., 37, 785 (1996)). Family
studies of siblings (Smalley, S.L. et al., Arch. Gen. Psychiat., 45, 953 (1988)) and
parents (Landa, R. et al., J. Speech Hear. Res., 34, 1339 (1991); Landa, R. et al.,
Psych. Med., 22, 245 (1992)) also support the conclusion that an inherited risk is
involved in many, perhaps all, cases of autism spectrum disorders. While the rate of
autism is elevated in close relatives of cases, the rate of symptoms short of the
diagnosis is increased much more. That is, individuals known to share genetic factors
seem to vary in the degree to which symptoms are expressed. This non-Mendelian
pattern (Jorde, L.B. et al., Am. J. Hum. Genet., 49, 932 (1991)) suggests a complex
disorder with major contributions from predisposing genetic factors, which interact
with the overall genetic background and/or environmental insults to determine the
phenotype.

The ability to identify the genetic factors that increase the risk for autism
would be a breakthrough for genetic counseling for prevention of the disorder. In
addition, it would allow the creation of genetically-engineered animals in which to
study the environmental factors that interact with the inherited predispositions. Tests
for genetic factors would also serve as biomarkers, valuable for diagnosis, and useful
in research on all aspects of the autism spectrum. Unfortunately, neither linkage nor
association studies have revealed any chromosomal regions strongly related to autism
(e.g., Spence, M.A. et al., Behav. Genet., 15, 1 (1985); Smalley, S.L. et al., Arch. Gen.
Psychiat., 45, 953 (1988); Cook, E.H. et al., Molec. Psychiat., 2, 247 (1997); Klauck,
S.M. et al., Hum. Molec. Genet., 6, 2233 (1997); Cook, E.H. et al., Am. J. Hum.
Genet., 62, 1077 (1998)).

Furthermore, while there is no known medical treatment for autism, some
success has been reported for early intervention with behavioral therapies. A
biomarker would allow identification of the disease, now typically diagnosed between
ages three and five, in infancy or prenatal life. Thus, there is an urgent need for a
method of reliably identifying subjects with autism. In particular there is need for a
blood test for polymorphisms causing autism spectrum disorders. Families with
affected members need to know whether they carry a mutation which could affect
future pregnancies. Clinicians need a test as an aid in diagnosis, and researchers
would use the test to classify subjects according to the etiology of their disease.

SUMMARY OF THE INVENTION 

The present invention relates to a method for screening subjects for genetic 
markers associated with autism. A biological sample is isolated from a mammal and
then tested for the presence of a mutated gene or a product thereof which is associated
with autism.

Another aspect of the invention is an isolated nucleic acid encoding a HoxA1
allele having a polymorphism which is associated with autism spectrum disorders.

Yet another aspect of the invention is an isolated nucleic acid encoding a
HoxB1 allele having a polymorphism which is associated with autism spectrum
disorders.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 shows two different alleles of HoxA1 from a case of autism spectrum
disorder. Figure 1A shows the previously published sequence of wild-type HoxA1.
Figure 1B shows a previously unknown polymorphism having a single base
substitution at position 218, where an A is changed to a G.

Figure 2 shows a second polymorphism, which was identified in the first exon of
HoxB1. The published sequence of wild-type HoxB1 (Figure 2A) is compared to the
previously unknown polymorphism in this paralog of HoxA1 (Figure 2B). In this
case, the anomaly is a nine-base insertion that adds a third repeat where two are
normally present. The result is three extra amino acids (serine-alanine-histidine).

For each of the polymorphisms, it was possible to test for the presence of the allele
different from the known sequence by digesting PCR product with a restriction
enzyme (Hph-I for HoxA1 and Msp-I for HoxB1). Sequencing reactions were carried
out on 30-40 subjects to be certain that the digestion results match the sequencing
results, demonstrating that the digestion procedure detects the deviant sequence
described and no other.
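
The logic of the digestion test for the HoxA1 substitution can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used: it assumes that Hph-I recognizes the sequence GGTGA on either strand (general knowledge about the enzyme, not something stated in this document), it examines only bases 181 to 240 of the HoxA1 sequence given in SEQ. ID. No. 1, and the function names are hypothetical.

```python
# Bases 181-240 of the HoxA1 coding sequence, copied from SEQ. ID. No. 1.
WILD_TYPE_181_240 = (
    "ATCGGTTCGC" "CCCACCACCA" "CCACCACCAC"
    "CACCATCACC" "ACCCCCAGCC" "GGCTACCTAC"
)
FRAGMENT_START = 181  # 1-based gene position of the first base above

# Assumption (not from the source): Hph-I recognition sequence.
HPHI_SITE = "GGTGA"


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def substitute(seq, position, new_base):
    """Return seq with the base at a 1-based gene position replaced."""
    i = position - FRAGMENT_START
    return seq[:i] + new_base + seq[i + 1:]


def site_positions(seq, site):
    """List 1-based gene positions where the site starts on either strand."""
    motifs = {site, reverse_complement(site)}
    return [
        FRAGMENT_START + i
        for i in range(len(seq) - len(site) + 1)
        if seq[i:i + len(site)] in motifs
    ]


# Apply the A-to-G substitution at position 218 described in the text.
variant = substitute(WILD_TYPE_181_240, 218, "G")

print("wild type sites:", site_positions(WILD_TYPE_181_240, HPHI_SITE))
print("variant sites:  ", site_positions(variant, HPHI_SITE))
```

Under the stated assumption, the wild-type fragment carries one recognition site spanning position 218 and the variant fragment carries none, which is the kind of difference a restriction digest can reveal as a change in fragment pattern.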

DETAILED DESCRIPTION OF THE INVENTION 

The present invention provides a method for screening subjects for genetic
markers associated with autism. A biological sample is isolated from a mammal and
then tested for the presence of a mutated gene or a product thereof which is associated
with autism.

Polymorphisms in Hox genes are shown to be associated with autism spectrum
disorders. The Hox genes are a family of genes that function in the patterning of body
structures that develop along an anteroposterior axis, such as the limbs, skeleton, and
nervous system; they are expressed during embryonic development at specific times
in limited regions of the embryo. In the mouse, for example, Hox-a1 is expressed in
rhombomeres 4 through 8 of the developing hindbrain on days 8 to 8.5 of gestation.
The Hox genes control the pattern formation of the hindbrain. Similar abnormalities
have been observed in the brains of autistic individuals (Rodier et al., J. Comp. Neurol.,
370, 247 (1996), which is hereby incorporated by reference).

The DNA and amino acid sequences for HoxA1 have previously been
reported (Acampora, D. et al., Nucleic Acids Res., 17, 10385 (1989); Hong, Y. et al.,
Gene, 159, 209 (1995), which are hereby incorporated by reference). Exon 1 stretches
from base 1 to base 357. Exon 2 stretches from base 358 to the end (1008). The
wild-type gene sequence for HoxA1 is provided in SEQ. ID. No. 1 as follows:

ATGGACAATG CAAGAATGAA CTCCTTCCTG GAATACCCCA TACTTAGCAG TGGCGACTCG 60
GGGACCTGCT CAGCCCGAGC CTACCCCTCG GACCATAGGA TTACAACTTT CCAGTCGTGC 120
GCGGTCAGCG CCAACAGTTG CGGCGGCGAC GACCGCTTCC TAGTGGGCAG GGGGGTGCAG 180
ATCGGTTCGC CCCACCACCA CCACCACCAC CACCATCACC ACCCCCAGCC GGCTACCTAC 240
CAGACTTCCG GGAACCTGGG GGTGTCCTAC TCCCACTCAA GTTGTGGTCC AAGCTATGGC 300
TCACAGAACT TCAGTGCGCC TTACAGCCCC TACGCGTTAA ATCAGGAAGC AGACGTAAGT 360
GGTGGGTACC CCCAGTGCGC TCCCGCTGTT TACTCTGGAA ATCTCTCATC TCCCATGGTC 420
CAGCATCACC ACCACCACCA GGGTTATGCT GGGGGCGCGG TGGGCTCGCC TCAATACATT 480
CACCACTCAT ATGGACAGGA GCACCAGAGC CTGGCCCTGG CTACGTATAA TAACTCCTTG 540
TCCCCTCTCC ACGCCAGCCA CCAAGAAGCC TGTCGCTCCC CCGCATCGGA GACATCTTCT 600
CCAGCGCAGA CTTTTGACTG GATGAAAGTC AAAAGAAACC CTCCCAAAAC AGGGAAAGTT 660
GGAGAGTACG GCTACCTGGG TCAACCCAAC GCGGTGCGCA CCAACTTCAC TACCAAGCAG 720
CTCACGGAAC TGGAGAAGGA GTTCCACTTC AACAAGTACC TGACGCGCGC CCGCAGGGTG 780
GAGATCGCTG CATCCCTGCA GCTCAACGAG ACCCAAGTGA AGATCTGGTT CCAGAACCGC 840
CGAATGAAGC AAAAGAAACG TGAGAAGGAG GGTCTCTTGC CCATCTCTCC GGCCACCCCG 900
CCAGGAAACG ACGAGAAGGC CGAGGAATCC TCAGAGAAGT CCAGCTCTTC GCCCTGCGTT 960
CCTTCCCCGG GGTCTTCTAC CTCAGACACT CTGACTACCT CCCACTGA 1008

The nucleic acid molecule of SEQ. ID. No. 1 encodes a polypeptide having
the amino acid sequence of SEQ. ID. No. 2, as follows:

[Residues 1 to 90 are not preserved in this copy.]
SHSSCGPSYGSQNFS 105
APYSPYALNQEADVS 120
GGYPQCAPAVYSGNL 135
SSPMVQHHHHHQGYA 150
GGAVGSPQYIHHSYG 165
QEHQSLALATYNNSL 180
SPLHASHQEACRSPA 195
SETSSPAQTFDWMKV 210
KRNPPKTGKVGEYGY 225
LGQPNAVRTNFTTKQ 240
LTELEKEFHFNKYLT 255
RARRVEIAASLQLNE 270
TQVKIWFQNRRMKQK 285
KREKEGLLPISPATP 300
PGNDEKAEESSEKSS 315
SSPCVPSPGSSTSDT 330
LTTSH 335

A polymorphism in the HoxA1 gene has been isolated and sequenced. This
polymorphism is associated with autism spectrum disorders. A single base
substitution is located at position 218 (underlined) of SEQ. ID. No. 3, where an A is
changed to a G, as follows:

ATGGACAATG CAAGAATGAA CTCCTTCCTG GAATACCCCA TACTTAGCAG TGGCGACTCG 60
GGGACCTGCT CAGCCCGAGC CTACCCCTCG GACCATAGGA TTACAACTTT CCAGTCGTGC 120
GCGGTCAGCG CCAACAGTTG CGGCGGCGAC GACCGCTTCC TAGTGGGCAG GGGGGTGCAG 180
ATCGGTTCGC CCCACCACCA CCACCACCAC CACCATCGCC ACCCCCAGCC GGCTACCTAC 240
CAGACTTCCG GGAACCTGGG GGTGTCCTAC TCCCACTCAA GTTGTGGTCC AAGCTATGGC 300
TCACAGAACT TCAGTGCGCC TTACAGCCCC TACGCGTTAA ATCAGGAAGC AGACGTAAGT 360
GGTGGGTACC CCCAGTGCGC TCCCGCTGTT TACTCTGGAA ATCTCTCATC TCCCATGGTC 420
CAGCATCACC ACCACCACCA GGGTTATGCT GGGGGCGCGG TGGGCTCGCC TCAATACATT 480
CACCACTCAT ATGGACAGGA GCACCAGAGC CTGGCCCTGG CTACGTATAA TAACTCCTTG 540
TCCCCTCTCC ACGCCAGCCA CCAAGAAGCC TGTCGCTCCC CCGCATCGGA GACATCTTCT 600
CCAGGGCAGA CTTTTGACTG GATGAAAGTC AAAAGAAACC CTCCCAAAAC AGGGAAAGTT 660
GGAGAGTACG GCTACCTGGG TCAACCCAAC GCGGTGCGCA CCAACTTCAC TACCAAGCAG 720
CTCACGGAAC TGGAGAAGGA GTTCCACTTC AACAAGTACC TGACGCGCGC CCGCAGGGTG 780
GAGATCGCTG CATCCCTGCA GCTCAACGAG ACCCAAGTGA AGATCTGGTT CCAGAACCGC 840
CGAATGAAGC AAAAGAAACG TGAGAAGGAG GGTCTCTTGC CCATCTCTCC GGCCACCCCG 900
CCAGGAAACG ACGAGAAGGC CGAGGAATCC TCAGAGAAGT CCAGCTCTTC GCCCTGCGTT 960
CCTTCCCCGG GGTCTTCTAC CTCAGACACT CTGACTACCT CCCACTGA 1008

The single base substitution at position 218 results in the replacement of
histidine with arginine (underlined). The resulting protein has the amino acid
sequence (SEQ. ID. No. 4) as follows:



[Amino acid sequence listing of SEQ. ID. No. 4, numbered in rows of 15 residues from 90 to 335; the residues themselves are not preserved in this copy.]
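
The histidine-to-arginine replacement can be checked directly from the nucleotide listings. The following Python sketch is illustrative only: it uses the standard genetic code, takes bases 181 to 240 from SEQ. ID. No. 1, and assumes the reading frame begins at base 1 of that sequence; the helper name `codon_at` is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Bases 181-240 of the HoxA1 coding sequence, copied from SEQ. ID. No. 1.
WILD_TYPE_181_240 = (
    "ATCGGTTCGC" "CCCACCACCA" "CCACCACCAC"
    "CACCATCACC" "ACCCCCAGCC" "GGCTACCTAC"
)
FRAGMENT_START = 181  # 1-based gene position of the first base above


def codon_at(seq, position, first_position=FRAGMENT_START):
    """Return (codon number, codon) for a 1-based gene position.

    Assumes the reading frame starts at gene position 1.
    """
    codon_number = (position - 1) // 3 + 1
    offset = (codon_number - 1) * 3 + 1 - first_position
    return codon_number, seq[offset:offset + 3]


# Apply the A-to-G substitution at position 218.
index = 218 - FRAGMENT_START
variant = WILD_TYPE_181_240[:index] + "G" + WILD_TYPE_181_240[index + 1:]

number, wild_codon = codon_at(WILD_TYPE_181_240, 218)
_, variant_codon = codon_at(variant, 218)
print(
    f"codon {number}: {wild_codon} ({CODON_TABLE[wild_codon]}) -> "
    f"{variant_codon} ({CODON_TABLE[variant_codon]})"
)
```

Position 218 is the second base of codon 73, so the substitution changes CAC (histidine) to CGC (arginine), in agreement with the text.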



In addition to the polymorphism in HoxA1, a polymorphism associated with
autism spectrum disorders has been isolated and sequenced from the HoxB1 gene.
The Hoxb1 gene has not been studied as comprehensively as Hoxa1 in transgenic
knockouts, but is expressed at the same stage (Murphy, P. et al., Development, 111, 61
(1991), which is hereby incorporated by reference). Its null mutation produces
similar malformations, including severe diminution of the facial nucleus (Goddard,
J.M. et al., Development, 122, 3217 (1996), which is hereby incorporated by
reference). The similarity of expression and function of these two genes is due to the
fact that they were originally a single gene in invertebrates (Ruddle, F.H. et al., Annu.
Rev. Genet., 28, 423 (1993), which is hereby incorporated by reference). In
mammals, the two appear on separate chromosomes (human 7 and 17), but the
sequence of each of the mammalian genes is similar to the other, and similar to the
original single gene from which the two mammalian loci arose. The sequence of the
wild-type HoxB1 gene (SEQ. ID. No. 5) follows:

TGACGCATGG ACTATAATAG GATGAACTCC TTCTTAGAGT ACCCACTCTG TAACCGGGGA 60
CCCAGCGCCT ACAGCGCCCA CAGCGCCCCA ACCTCCTTTC CCCCAAGCTC GGCTCAGGCG 120
GTTGACAGCT ATGCAAGCGA GGGCCGCTAC GGTGGGGGGC TGTCCAGCCC TGCGTTTCAG 180
CAGAACTCCG GCTATCCCGC CCAGCAGCCG CCTTCGACCC TGGGGGTGCC CTTCCCCAGC 240
TCCGCGCCCT CGGGGTATGC TCCTGCCGCC TGCAGCCCCA GCTACGGGCC TTCTCAGTAC 300
TACCCTCTGG GTCAATCAGA AGGAGACGGA GGCTATTTTC ATCCCTCGAG CTACGGGGCC 360
CAGCTAGGGG GCTTGTCCGA TGGCTACGGA GCAGGTGGAG CCGGTCCGGG GCCATATCCT 420
CCGCAGCATC CCCCTTATGG GAACGAGCAG ACCGCGAGCT TTGCACCGGC CTATGCTGAT 480
CTCCTCTCCG AGGACAAGGA AACACCCTGC CCTTCAGAAC CTAACACCCC CACGGCCCGG 540
ACCTTCGACT GGATGAAGGT TAAGAGAAAC CCACCCAAGA CAGCGAAGGT GTCAGAGCCA 600
GGCCTGGGCT CGCCCAGTGG CCTCCGCACC AACTTCACCA CAAGGCAGCT GACAGAACTG 660
GAAAAGGAGT TCCATTTCAA CAAGTACCTG AGCCGGGCCC GGAGGGTGGA GATTGCCGCC 720
ACCCTGGAGC TCAATGAAAC ACAGGTCAAG ATTTGGTTCC AGAACCGACG AATGAAGCAG 780
AAGAAGCGCG AGCGAGAGGG AGGTCGGGTC CCCCCAGCCC CACCAGGCTG CCCCAAGGAG 840
GCAGGTGGAG ATGCCTCAGA CCAGTCGACA TGCACCTCCC CGGAAGCCTC ACCCAGCTCT 900
GTCACCTCCT GAACTGAACC TAGCCACCAA TGGGGCTTCC AGGCACTGGA GCGCCCCAGT 960
CCAGCCCTAT CCCAGGCTCT CCCAACCCAG GCGTGGCTTC ACTGCCTGGG ATCTCTAGGC 1020
T 1021

The protein encoded by nucleotides 7 to 909 of the wild-type HoxB1 gene
(SEQ. ID. No. 6) is as follows:

[Amino acid sequence listing of SEQ. ID. No. 6 is not preserved in this copy.]

As with the HoxA1 gene, polymorphisms associated with autism spectrum
disorders were found with HoxB1. The HoxB1 mutation occurs after base 88 (C) with
the insertion of nine nucleotides (ACAGCGCCC). The location of this insertion is
such that the amino acid sequence also changes. The normal sequence reads
...serine-alanine-histidine-serine-alanine-proline. The mutant sequence has an extra
serine-alanine-histidine sequence and then the sequence resumes normally. The
insertion and altered amino acid sequence are underlined below. A mutated form of
HoxB1 (SEQ. ID. No. 7) is depicted as follows:

TGACGCATGG ACTATAATAG GATGAACTCC TTCTTAGAGT ACCCACTCTG TAACCGGGGA 60
CCCAGCGCCT ACAGCGCCCA CAGCGCCCAC AGCGCCCCAA CCTCCTTTCC CCCAAGCTCG 120
GCTCAGGCGG TTGACAGCTA TGCAAGCGAG GGCCGCTACG GTGGGGGGCT GTCCAGCCCT 180
GCGTTTCAGC AGAACTCCGG CTATCCCGCC CAGCAGCCGC CTTCGACCCT GGGGGTGCCC 240
TTCCCCAGCT CCGCGCCCTC GGGGTATGCT CCTGCCGCCT GCAGCCCCAG CTACGGGCCT 300
TCTCAGTACT ACCCTCTGGG TCAATCAGAA GGAGACGGAG GCTATTTTCA TCCCTCGAGC 360
TACGGGGCCC AGCTAGGGGG CTTGTCCGAT GGCTACGGAG CAGGTGGAGC CGGTCCGGGG 420
CCATATCCTC CGCAGCATCC CCCTTATGGG AACGAGCAGA CCGCGAGCTT TGCACCGGCC 480
TATGCTGATC TCCTCTCCGA GGACAAGGAA ACACCCTGCC CTTCAGAACC TAACACCCCC 540
ACGGCCCGGA CCTTCGACTG GATGAAGGTT AAGAGAAACC CACCCAAGAC AGCGAAGGTG 600
TCAGAGCCAG GCCTGGGCTC GCCCAGTGGC CTCCGCACCA ACTTCACCAC AAGGCAGCTG 660
ACAGAACTGG AAAAGGAGTT CCATTTCAAC AAGTACCTGA GCCGGGCCCG GAGGGTGGAG 720
ATTGCCGCCA CCCTGGAGCT CAATGAAACA CAGGTCAAGA TTTGGTTCCA GAACCGACGA 780
ATGAAGCAGA AGAAGCGCGA GCGAGAGGGA GGTCGGGTCC CCCCAGCCCC ACCAGGCTGC 840
CCCAAGGAGG CAGCTGGAGA TGCCTCAGAC CAGTCGACAT GCACCTCCCC GGAAGCCTCA 900
CCCAGCTCTG TCACCTCCTG AACTGAACCT AGCCACCAAT GGGGCTTCCA GGCACTGGAG 960
CGCCCCAGTC CAGCCCTATC CCAGGCTCTC CCAACCCAGG CCTGGCTTCA CTGCCTGGGA 1020
TCTCTAGGCT 1030
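
The effect of the nine-base insertion on the reading frame can be illustrated with a short Python sketch. It is a hedged example rather than part of the described method: it uses the standard genetic code, takes bases 1 to 120 of the wild-type HoxB1 sequence given in SEQ. ID. No. 5, starts translation at nucleotide 7 as stated in the text, and uses a hypothetical helper named `translate`.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Bases 1-120 of wild-type HoxB1, copied from SEQ. ID. No. 5.
WILD_TYPE_1_120 = (
    "TGACGCATGGACTATAATAGGATGAACTCCTTCTTAGAGTACCCACTCTGTAACCGGGGA"
    "CCCAGCGCCTACAGCGCCCACAGCGCCCCAACCTCCTTTCCCCCAAGCTCGGCTCAGGCG"
)
INSERTION = "ACAGCGCCC"  # nine nucleotides inserted after base 88
CODING_START = 7         # translation begins at nucleotide 7


def translate(seq, start=CODING_START):
    """Translate seq from a 1-based start, ignoring any trailing partial codon."""
    coding = seq[start - 1:]
    usable = len(coding) - len(coding) % 3
    return "".join(CODON_TABLE[coding[i:i + 3]] for i in range(0, usable, 3))


# Build the variant by inserting the nine bases after base 88.
mutant = WILD_TYPE_1_120[:88] + INSERTION + WILD_TYPE_1_120[88:]

wild_protein = translate(WILD_TYPE_1_120)
mutant_protein = translate(mutant)
print(wild_protein)
print(mutant_protein)
print("extra residues:", len(mutant_protein) - len(wild_protein))
```

Because nine is a multiple of three, the downstream reading frame is unchanged; the translated variant differs from the wild type only by one additional serine-alanine-histidine repeat.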

The protein encoded by SEQ. ID. No. 8 is as follows:

[Amino acid sequence listing of SEQ. ID. No. 8 is not preserved in this copy.]

Genes which have been duplicated and then maintained similar functions over
the course of evolution are called "paralogs." A third paralog derived from the same
invertebrate gene is known as HoxD1. This gene has not yet been studied in
knockouts, but is known to have evolved to be expressed in somewhat different
embryonic tissues (mesoderm vs. ectoderm) in the hindbrain region at the same stage
of development as Hoxa1 and Hoxb1. Thus preferred Hox genes include HoxA1,
HoxB1, and HoxD1.

Biological samples suitable for testing include blood, saliva, amniotic fluid, 
and tissue. The most preferred biological sample is blood. However, any biological 
sample from which genetic material or the products of the marker genes can be 
isolated is suitable. 

Because the Hox genes are highly conserved among species, the present
invention is applicable for screening for autism-related polymorphisms in mammals.
The screening method can be utilized to identify animals carrying defects in genes
like those which give rise to autism in humans in order to study the progression of the
disease and test treatments. However, the preferred mammal to be screened is a
human. In particular, the biological samples are isolated from developmentally
disabled children or adults in order to determine whether they carry the marker
associated with autism to assist in diagnosing the disease. Similarly, the parents or
relatives of disabled children may be screened to determine whether they are carriers
of the mutated gene. Samples from children, including infants, may also be tested to
identify those children who have genetic markers associated with autism in order to
provide them with early behavior training.

As discussed more fully in the examples, polymorphisms in the HoxA1 gene
are associated with autism spectrum disorders. In addition to HoxA1, the HoxB1 and
HoxD1 genes are also involved in the same stages of early brain development. Hoxb1
and Hoxd1 are related developmental genes which are expressed at the same time and
in approximately the same region of the embryo as Hoxa1. The Hox genes are closely
related and may perform similar functions in development. Evolutionarily, the various
Hox genes were probably derived from a common ancestral gene. Thus, the preferred
genes to be screened include Hoxa1, Hoxb1, and Hoxd1.

The mutation in the mutated gene may be a single base substitution mutation
resulting in an amino acid substitution, a single base substitution mutation resulting in
a translational stop, an insertion mutation, a deletion mutation, or a gene
rearrangement. As demonstrated by the identified polymorphisms in HoxA1 and
HoxB1, polymorphisms which disrupt the gene or result in an altered peptide are
associated with autism spectrum disorders.

The mutation may be located in an intron, an exon of the gene, or a promoter
or other regulatory region which affects the expression of the gene.

Methods for screening for mutated nucleic acids include direct sequencing of
nucleic acids, single strand polymorphism assay, ligase chain reaction, enzymatic
cleavage, and Southern hybridization.

Screening for mutated nucleic acids can be accomplished by direct sequencing
of nucleic acids. In fact, putative mutants identified by other methods may be
sequenced to determine the exact nature of the mutation. Nucleic acid sequences can
be determined through a number of different techniques which are well known to
those skilled in the art. In order to sequence the nucleic acid, the material must first
be amplified to provide sufficient copies.

Amplification of a selected, or target, nucleic acid sequence may be carried
out by any suitable means. (See generally Kwoh, D. and Kwoh, T., Am. Biotechnol.
Lab., 8, 14 (1990), which is hereby incorporated by reference.) Examples of suitable
amplification techniques include, but are not limited to, polymerase chain reaction,
ligase chain reaction (see Barany, Proc. Natl. Acad. Sci. USA, 88, 189 (1991), which is
hereby incorporated by reference), strand displacement amplification (see generally
Walker, G. et al., Nucleic Acids Res., 20, 1691 (1992); Walker, G. et al., Proc. Natl.
Acad. Sci. USA, 89, 392 (1992), which are hereby incorporated by reference),
transcription-based amplification (see Kwoh, D. et al., Proc. Natl. Acad. Sci. USA, 86,
1173 (1989), which is hereby incorporated by reference), self-sustained sequence
replication (or "3SR") (see Guatelli, J. et al., Proc. Natl. Acad. Sci. USA, 87, 1874
(1990), which is hereby incorporated by reference), the Qβ replicase system (see
Lizardi, P. et al., Biotechnology, 6, 1197 (1988), which is hereby incorporated by
reference), nucleic acid sequence-based amplification (or "NASBA") (see Lewis, R.,
Genetic Engineering News, 12(9), 1 (1992), which is hereby incorporated by
reference), the repair chain reaction (or "RCR") (see Lewis, R., Genetic Engineering
News, 12(9), 1 (1992), which is hereby incorporated by reference), and boomerang
DNA amplification (or "BDA") (see Lewis, R., Genetic Engineering News, 12(9), 1
(1992), which is hereby incorporated by reference). Polymerase chain reaction is
currently preferred.

In general, DNA amplification techniques such as the foregoing involve the
use of a probe, a pair of probes, or two pairs of probes which specifically bind to
DNA encoding the gene of interest, but do not bind to DNA which does not encode
the gene, under the same hybridization conditions, and which serve as the primer or
primers for the amplification of the gene of interest or a portion thereof in the
amplification reaction.

Nucleic acid sequencing can be performed by chemical or enzymatic methods.
The enzymatic method relies on the ability of DNA polymerase to extend a primer,
hybridized to the template to be sequenced, until a chain-terminating nucleotide is
incorporated. The most common methods utilize dideoxynucleotides. Primers may
be labelled with radioactive or fluorescent labels. Various DNA polymerases are
available, including Klenow fragment, AMV reverse transcriptase, Thermus aquaticus
DNA polymerase, and modified T7 polymerase.
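
The chain-termination principle described above can be illustrated with a toy model. The sketch below is not part of the described protocol: the function names `termination_ladders` and `read_gel` are hypothetical, and it assumes an idealized reaction in which every possible terminated extension product is produced and cleanly resolved by length.

```python
def termination_ladders(synthesized):
    """For each dideoxy reaction (A, C, G, T), list the lengths of the
    extension products that end in that chain-terminating nucleotide.

    `synthesized` is the strand built by the polymerase, written 5'->3'
    starting at the first base added after the primer.
    """
    ladders = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized, start=1):
        ladders[base].append(length)
    return ladders


def read_gel(ladders):
    """Read the sequence from the four lanes, shortest fragment first."""
    base_at_length = {}
    for base, lengths in ladders.items():
        for length in lengths:
            base_at_length[length] = base
    return "".join(base_at_length[n] for n in sorted(base_at_length))


if __name__ == "__main__":
    lanes = termination_ladders("GATTACA")
    print(lanes)
    print(read_gel(lanes))
```

Each lane holds only the fragments ending in one base, so ordering all bands by length across the four lanes recovers the synthesized strand.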

Although DNA sequencing is clearly the most sensitive and informative
method, it is too cumbersome for routine use in searching for polymorphisms,
especially when the DNA segment of interest is large. Several other methods are
available for a rapid search for changes in autism-associated genes.

Recently, single strand polymorphism assay ("SSPA") analysis and the closely
related heteroduplex analysis methods have come into use as effective methods for
screening for single-base polymorphisms (Orita, M. et al., Proc Natl Acad Sci USA,
86, 2766 (1989), which is hereby incorporated by reference). In these methods, the
mobility of PCR-amplified test DNA from clinical specimens is compared with the
mobility of DNA amplified from normal sources by direct electrophoresis of samples
in adjacent lanes of native polyacrylamide or other types of matrix gels. Single-base
changes often alter the secondary structure of the molecule sufficiently to cause slight
mobility differences between the normal and mutant PCR products after prolonged
electrophoresis.

Ligase chain reaction (LCR) is yet another recently developed method of screening
for mutated nucleic acids, and is carried out in accordance with known techniques.
LCR is especially useful to amplify, and thereby
detect, single nucleotide differences between two DNA samples. In general, the
reaction is carried out with two pairs of oligonucleotide probes: one pair binds to one
strand of the sequence to be detected; the other pair binds to the other strand of the
sequence to be detected. The reaction is carried out by, first, denaturing (e.g.,
separating) the strands of the sequence to be detected, then reacting the strands with
the two pairs of oligonucleotide probes in the presence of a heat stable ligase so that
each pair of oligonucleotide probes hybridizes to target DNA and, if there is perfect
complementarity at their junction, adjacent probes are ligated together. The
hybridized molecules are then separated under denaturation conditions. The process
is cyclically repeated until the sequence has been amplified to the desired degree.
Detection may then be carried out in a manner like that described above with respect
to PCR.
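
The junction requirement that lets LCR discriminate single nucleotide differences can be sketched as follows. This is a simplified illustration, not a protocol: it assumes that hybridization requires an exact match along the whole probe, the function `probes_ligate` is hypothetical, and the two probes are invented for the example (they are designed around nucleotide 218 of SEQ ID NO: 1 and are not probes disclosed in the text).

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def probes_ligate(template, upstream_probe, downstream_probe):
    """True if both probes anneal side by side on `template` with perfect
    complementarity, so that a ligase could join them.

    Simplifying assumption: any mismatch, including one at the junction,
    prevents ligation.
    """
    joined = upstream_probe + downstream_probe
    return reverse_complement(joined) in template


# Sense-strand windows around the polymorphic base (illustrative only).
WILD_TYPE = "CACCACCATCACCACCCCCAG"
MUTANT = "CACCACCATCGCCACCCCCAG"

# Hypothetical probe pair; the upstream probe ends on the wild-type base.
UPSTREAM = "CACCACCATCA"
DOWNSTREAM = "CCACCCCCAG"

if __name__ == "__main__":
    for name, sense in (("wild type", WILD_TYPE), ("mutant", MUTANT)):
        template = reverse_complement(sense)  # strand the probes anneal to
        print(name, probes_ligate(template, UPSTREAM, DOWNSTREAM))
```

Only the wild-type template supports ligation here, which is why a ligation product (or its absence) reports on the single base at the probe junction.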

Southern hybridization is also an effective method of identifying differences
in sequences. Hybridization conditions, such as salt concentration and temperature,
can be adjusted for the sequence to be screened. Southern blotting and hybridization
protocols are described in Current Protocols in Molecular Biology (Greene Publishing
Associates and Wiley-Interscience), pages 2.9.1-2.9.10. Probes can be labelled for
hybridization with random oligomers (primarily 9-mers) and the Klenow fragment of
DNA polymerase. Very high specific activity probe can be obtained using
commercially available kits such as the Ready-To-Go DNA Labelling Beads
(Pharmacia Biotech), following the manufacturer's protocol. Briefly, 25 ng of DNA
(probe) is labelled with ³²P-dCTP in a 15 minute incubation at 37°C. Labelled probe
is then purified over a ChromaSpin (Clontech) nucleic acid purification column.
Possible competition of probes having high repeat sequence content, and stringency
of hybridization and washdown will be determined individually for each probe used.
Alternatively, fragments of a candidate gene may be generated by PCR, the
specificity may be verified using a rodent-human somatic cell hybrid panel, and
the fragment may be subcloned. This allows for a large prep for sequencing and use as a
probe. Once a given gene fragment has been characterized, small probe preps can be
done by gel- or column-purifying the PCR product.

These mismatch detection protocols use samples generated by PCR and thus
require use of very little genomic template. All of these methods can provide very
good clues regarding the location of the sequence change which leads to the
appearance of anomalous bands, hence facilitating subsequent cloning and sequencing
strategies.

Methods of screening for mutated nucleic acids can be carried out using either
deoxyribonucleic acids ("DNA") or messenger ribonucleic acids ("mRNA") isolated
from the biological sample. During periods when the gene is expressed, mRNA may
be abundant and more readily detected. However, these genes are temporally
controlled and, at most stages of development, the preferred material for screening is
DNA.

Alternatively, the detection of a mutated gene associated with autism can be
carried out by collecting a biological sample and testing for the presence or form of
the protein produced by the gene. The mutation in the gene may result in the
production of a mutated form of the peptide or the lack of production of the gene
product. In this embodiment, the determination of the presence of the polymorphic
form of the protein can be carried out, for example, by isoelectric focusing, protein
sizing, or immunoassay. In an immunoassay, an antibody that selectively binds to the
mutated protein can be utilized (for example, an antibody that selectively binds to the
mutated form of HoxA1 encoded protein). Such methods for isoelectric focusing and
immunoassay are well known in the art, and are discussed in further detail below.

Changes in the size or charge of the polypeptide can be identified by
isoelectric focusing or protein sizing techniques. Changes resulting in amino acid
substitutions, where the substituted amino acid has a different charge than the original
amino acid, can be detected by isoelectric focusing. Isoelectric focusing of the
polypeptide through a gel having an ampholine gradient at high voltages separates
proteins by their pI. The pH gradient gel can be compared to a simultaneously run gel
containing the wild-type protein. Protein sizing techniques such as protein
electrophoresis and sizing chromatography can also be used to detect changes in the
size of the product.
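
Why a charge-changing substitution shifts the position of a polypeptide in isoelectric focusing can be sketched numerically. The example below is a rough illustration under stated assumptions: the pKa values are approximate textbook figures (published scales differ), the functions `net_charge` and `isoelectric_point` are hypothetical helpers, and the two 20-residue peptides are a translation of nucleotides 181-240 of SEQ ID NO: 1 made for this sketch, with and without the substitution at nucleotide 218. A real protein's pI depends on the full sequence and its folding.

```python
# Approximate pKa values (assumed for illustration; scales differ slightly).
POSITIVE_PKA = {"N_TERM": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"C_TERM": 3.1, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(peptide, ph):
    """Henderson-Hasselbalch estimate of the net charge of a peptide."""
    groups = ["N_TERM", "C_TERM"] + list(peptide)
    charge = 0.0
    for group in groups:
        if group in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[group]))
        elif group in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[group] - ph))
    return charge


def isoelectric_point(peptide, low=0.0, high=14.0):
    """Find the pH of zero net charge by bisection.

    Net charge falls monotonically as pH rises, so bisection converges.
    """
    for _ in range(60):
        mid = (low + high) / 2.0
        if net_charge(peptide, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


# Illustrative peptides (assumption: translated from SEQ ID NO: 1, nt 181-240).
WILD_TYPE_PEPTIDE = "IGSPHHHHHHHHHHPQPATY"
VARIANT_PEPTIDE = "IGSPHHHHHHHHRHPQPATY"

if __name__ == "__main__":
    for name, pep in (("wild type", WILD_TYPE_PEPTIDE), ("variant", VARIANT_PEPTIDE)):
        print(name, round(isoelectric_point(pep), 2))
```

Under these assumptions the variant peptide carries more positive charge at neutral pH and focuses at a higher pH than the wild type, which is the kind of shift a side-by-side gel comparison would reveal.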

As an alternative to isoelectric focusing or protein sizing, the step of
determining the presence of the mutated polypeptides in a sample may be carried out
by an antibody assay with an antibody which selectively binds to the mutated
polypeptides (i.e., an antibody which binds to the mutated polypeptides but exhibits
essentially no binding to the wild-type polypeptide without the polymorphism in the
same binding conditions).

Antibodies used to bind selectively the products of the mutated genes can be
produced by any suitable technique. For example, monoclonal antibodies may be
produced in a hybridoma cell line according to the techniques of Kohler and Milstein,
Nature, 256, 495 (1975), which is hereby incorporated by reference. A hybridoma is
an immortalized cell line which is capable of secreting a specific monoclonal
antibody. The mutated products of genes which are associated with autism may be
obtained from a human patient, purified, and used as the immunogen for the
production of monoclonal or polyclonal antibodies. Purified polypeptides may be
produced by recombinant means to express a biologically active isoform, or even an
immunogenic fragment thereof may be used as an immunogen. Monoclonal Fab
fragments may be produced in Escherichia coli from the known sequences by
recombinant techniques known to those skilled in the art. (See, e.g., Huse, W.,
Science, 246, 1275 (1989), which is hereby incorporated by reference) (recombinant
Fab techniques).

The term "antibodies" as used herein refers to all types of immunoglobulin,
including IgG, IgM, IgA, IgD, and IgE. The antibodies may be monoclonal or
polyclonal and may be of any species of origin, including (for example) mouse, rat,
rabbit, horse, or human, or may be chimeric antibodies, and include antibody
fragments such as, for example, Fab, F(ab')2, and Fv fragments, and the
corresponding fragments obtained from antibodies other than IgG.

Antibody assays may, in general, be homogeneous assays or heterogeneous assays.
In a homogeneous assay the immunological reaction usually involves the specific
antibody, a labeled analyte, and the sample of interest. The signal arising from the
label is modified, directly or indirectly, upon the binding of the antibody to the
labeled analyte. Both the immunological reaction and detection of the extent thereof
are carried out in a homogeneous solution. Immunochemical labels which may be
employed include free radicals, radioisotopes, fluorescent dyes, enzymes,
bacteriophages, coenzymes, and so forth.

In a heterogeneous assay approach, the reagents are usually the specimen, the
antibody of the invention and means for producing a detectable signal. Similar
specimens as described above may be used. The antibody is generally immobilized
on a support, such as a bead, plate, or slide, and contacted with the specimen
suspected of containing the antigen in a liquid phase. The support is then separated
from the liquid phase and either the support phase or the liquid phase is examined for
a detectable signal employing means for producing such signal. The signal is related
to the presence of the analyte in the specimen. Means for producing a detectable
signal include the use of radioactive labels, fluorescent labels, enzyme labels, and so
forth. For example, if the antigen to be detected contains a second binding site, an
antibody which binds to that site can be conjugated to a detectable group and added to
the liquid phase reaction solution before the separation step. The presence of the
detectable group on the solid support indicates the presence of the antigen in the test
sample. Examples of suitable immunoassays are the radioimmunoassay,
immunofluorescence methods, enzyme-linked immunoassays, and the like.

Those skilled in the art will be familiar with numerous specific immunoassay
formats and variations thereof which may be useful for carrying out the method
disclosed herein. See U.S. Patent No. 4,727,022, U.S. Patent No. 4,659,678, U.S.
Patent No. 4,376,110, U.S. Patent No. 4,275,149, U.S. Patent No. 4,233,402, and U.S.
Patent No. 4,230,767.

Antibodies which selectively bind a polymorphic DLST isoform may be
conjugated to a solid support suitable for a diagnostic assay (e.g., beads, plates, slides
or wells formed from materials such as latex or polystyrene) in accordance with
known techniques, such as precipitation. Antibodies which bind a polymorphic
DLST isoform may likewise be conjugated to detectable groups such as radiolabels
(e.g., ³⁵S, ¹²⁵I, ¹³¹I), enzyme labels (e.g., horseradish peroxidase, alkaline
phosphatase), and fluorescent labels (e.g., fluorescein) in accordance with known
techniques.

The invention further provides an isolated nucleic acid molecule which
encodes a HoxA1 gene having a single base substitution at nucleotide 218 in SEQ. ID.
No. 1. In another embodiment, the invention provides an isolated nucleic acid
molecule which encodes a HoxB1 gene having an insertion between
nucleotides 88 and 89 in SEQ. ID. No. 5. In addition, the invention provides
fragments of the HoxA1 and HoxB1 genes having the polymorphism, where the
fragment has at least 15 nucleotides and encompasses the polymorphism, i.e., the
single base substitution. Fragments longer than 15 nucleotides can be used to probe
for nucleic acid molecules containing the polymorphism. Longer fragments may be
used at higher stringency conditions.

The invention also provides isolated polypeptides that are encoded by the
genes having the polymorphisms. Either the whole protein or fragments thereof may
be used to induce the production of antibodies specific to the portion of the protein
which is affected by the polymorphism. Such antibodies may then be used to detect
the presence of a polymorphism. Preferred antibodies bind specifically to the protein
or polypeptide affected by the polymorphism but with less affinity to the wild-type
Hox protein.

In one embodiment, the antibody is a monoclonal antibody. For use in an
immunoassay, the antibody can be bound to a solid support or bound to a detectable
label.

EXAMPLES 

Example 1 - Collection of Blood Samples from Autistic Individuals 

Blood was collected from patients with autism and their immediate family
members in order to determine whether any polymorphisms in HoxA1 are present
among this population. All blood samples were procured following written consent
by the patients or their guardians. Among the samples collected were those of the
members of a family of four in which one child has autism and the other has
Asperger's syndrome; both children have malformed ears. The first son is retarded
and the second has normal intelligence. The parents have no obvious symptoms.
DNA was extracted from the blood by phenol-chloroform extraction following
isolation and lysis of the white blood cells. Control DNA was also used for these
experiments; this DNA was obtained from neurologically normal donors.

The 20 cc blood samples were left for three to four days at room temperature to
allow continued proliferation of white blood cells. White cells were pelleted,
followed by isolation of the nuclei. The nuclei were then incubated overnight at 37°C
in a lysis buffer consisting of EDTA, TNE-SDS, and proteinase K. Protein
contaminants were extracted by additions of buffered phenol followed by chloroform,
then DNA was precipitated by the addition of ice-cold ethanol. The DNA was
resuspended in TE buffer for storage at 4°C. Extraction of genomic DNA from fixed
tissue was carried out using the protocol of Volkenandt et al., Methods in Molecular
Biology, 15, 81, Humana Press (1993), which is hereby incorporated by reference.

Example 2 - Sequencing the HoxA1 Gene

The HoxA1 gene was amplified by PCR from DNA samples to provide
sufficient material for sequencing. Two sets of oligonucleotide primers were selected
after examination of the human HoxA1 nucleic acid sequence and comparison of the
sequence to those of human and mouse Hox genes. The first set was designed to
amplify residues 10-647, the second to amplify from residue 656 to the stop codon at
residue 1008, exons 1 and 2 of HoxA1, respectively. The primers were used in
polymerase chain reaction to amplify the target gene in several control blood samples,
in order to determine the appropriate PCR conditions. Both exons were amplified by
94°C denaturation for 1 min, 62°C annealing for 30 sec, and 72°C extension for 2
min, for 35 cycles. The products were visualized with ethidium bromide staining on a
1-2% agarose gel. PhiX174 RF DNA/Hae III fragments (Gibco) were used as a
molecular weight marker. The products were tested for chromosome origin by using
human-rodent monochromosomal somatic cell hybrids. Both sets of HoxA1 primers
amplified product from the hybrid containing human chromosome 7 and did not
amplify from any other hybrid. Establishing that the product amplified by the
primers is from the correct chromosome rules out the possibility that pseudogenes
with the same sequence occur at other sites or that the amplified product is another
homologous homeobox gene. It verifies that the PCR product represents only the
targeted gene.

The polymerase chain reaction (PCR) was performed with various samples of
control DNA in order to determine the appropriate conditions. Once the optimal
conditions were ascertained, the gene was amplified from the patient samples.

Following PCR, an aliquot of the product was used for DNA sequencing using
the Sequenase system version 2.0 (United States Biochemical), which is a
chain-termination method of DNA sequencing. The following procedure was used to read
the nucleic acid sequence of the amplified products. 7 µl of PCR product was mixed
with 2 µl shrimp alkaline phosphatase and 0.5 µl exonuclease I. The mixture was
incubated at 37°C for 15 min and then at 80°C for 15 min. After addition of 1 µl of
primer, the mixture was incubated at 100°C for 3 min and then chilled on ice for 5
min. Next, the sample was incubated for 5 min at room temperature with the
following additions: 2 µl 5x buffer, 1 µl DTT, 2 µl diluted dGTP, 0.5 µl ³⁵S-dATP,
and 2 µl diluted Sequenase buffer. A 3.5 µl aliquot of the mixture was then added to
1 µl of one dideoxyNTP. After 5 min at 37°C, 4 µl of stop solution was added to the
tube. The products were run on a 6% polyacrylamide sequencing gel for 2-4 hr.
Following this, the gel was dried on a BioRad gel dryer and exposed to film
overnight. Film was developed on a Kodak M35A X-OMAT Processor. The method
has been used successfully to duplicate the published sequence of the HoxA1 exons in
samples from a number of controls. The film was developed the next afternoon, and
the DNA sequence was read manually for comparison to the published HoxA1
sequence.

The nucleotide sequence from some patients, including the members of the 
family mentioned previously, showed the presence of two discrete bands at the same 
levels on the gel. 

Example 3 - Sequencing the PCR Products

Since sequencing PCR products allows the DNA sequence to be read from
both alleles, a sequence with double bands suggests heterozygosity -- that the two
alleles are not the same and that two different sequences superimposed on one another
are being read. Based on these results, the PCR products were cloned in order to get a
cleaner sequence. Cloning separates the two alleles and allows each to be
individually sequenced to determine whether one or both alleles are abnormal.

The PCR products were cloned using Invitrogen's Zero Blunt PCR Cloning
Kit. This kit is designed to clone blunt-ended PCR fragments, which can be
generated by using a thermostable DNA polymerase with proofreading activity. Once
the products were cloned, the clonal DNA was sequenced using the Sequenase
version 2.0 chain-termination sequencing system. Each clone was sequenced in both
5' and 3' directions, and the reactions were run out for 6 hours on a 6%
polyacrylamide sequencing gel.

Cloning allowed the determination that three out of four members of this
family are indeed heterozygous for HoxA1. The father and both children contain an
identical mutation in the gene: a single base-pair change of A to G in the first exon of
the gene; the mother's gene is normal. This mutation is dominant with variable
penetrance. Sequences showing the mutation can be seen in Figure 1. Figure 1A
shows the wild-type sequence. Substitution of guanine for adenine at this single
location, as shown in Figure 1B, causes an alteration in the resulting amino acid
sequence: the wild-type codon in SEQ ID NO: 1 (CAC, histidine) becomes CGC
(arginine), changing a histidine to an arginine.

Example 4 - Restriction Analysis of PCR Products

The PCR products from this family were also subjected to restriction enzyme
digestion to confirm the mutation. The enzyme HphI recognizes the specific
sequence 3'...CCACT(N7)...5'. When normal HoxA1 is digested with this enzyme, it
will be cut; however, when mutated HoxA1 is digested, it will not be cut, because the
recognition site has been changed by the mutation. This enzyme has been used to
digest PCR products from this family and confirm that the mutation does indeed exist
in the father and the children but not in the mother. This enzyme has been used to
digest PCR products from approximately 100 controls, 36 parent pairs, 26 affected
relatives, and 46 probands. In forty cases, the results of the restriction analysis have
been compared to those from the sequencing reactions. The two methods gave
identical results in every case.
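
The logic of the restriction test can be checked against the sequence listing. The sketch below is illustrative only: the helper names are hypothetical, the codon table is deliberately partial, and the 60-nucleotide window is copied from nucleotides 181-240 of SEQ ID NO: 1. It looks only at the site overlapping nucleotide 218; the full amplicon may contain additional HphI sites that this window does not show.

```python
# Nucleotides 181-240 of SEQ ID NO: 1 (wild type), from the sequence listing.
WINDOW_START = 181
WILD_TYPE = ("ATCGGTTCGC" "CCCACCACCA" "CCACCACCAC"
             "CACCATCACC" "ACCCCCAGCC" "GGCTACCTAC")

# HphI recognition sequence (GGTGA) and its reverse complement (TCACC).
HPHI_SITES = ("GGTGA", "TCACC")

# Partial codon table: only the codons needed for this window.
CODONS = {"CAC": "His", "CAT": "His", "CGC": "Arg", "CCC": "Pro"}


def substitute(seq, position, new_base, start=WINDOW_START):
    """Return `seq` with the base at 1-based `position` replaced."""
    i = position - start
    return seq[:i] + new_base + seq[i + 1:]


def hphi_site_positions(seq, start=WINDOW_START):
    """1-based positions at which an HphI recognition sequence begins."""
    hits = []
    for site in HPHI_SITES:
        i = seq.find(site)
        while i != -1:
            hits.append(i + start)
            i = seq.find(site, i + 1)
    return sorted(hits)


def codon_at(seq, position, start=WINDOW_START):
    """Codon containing `position`, with the reading frame set by nucleotide 1."""
    i = position - start - (position - 1) % 3
    return seq[i:i + 3]


MUTANT = substitute(WILD_TYPE, 218, "G")

if __name__ == "__main__":
    for name, seq in (("wild type", WILD_TYPE), ("mutant", MUTANT)):
        codon = codon_at(seq, 218)
        print(name, hphi_site_positions(seq), codon, CODONS[codon])
```

In this window the A-to-G substitution at nucleotide 218 removes the only HphI recognition sequence and changes the codon containing that nucleotide, which is why loss of cutting tracks the sequencing result.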

Example 5 - Sequencing of a Polymorphism in HoxB1

The sequence for the HoxB1 gene (accession number X16666) was obtained
from the Entrez database. From this sequence, primers for the amplification of a
575 bp product of exon 1 by PCR were designed (Sense: 5'-
GCATGGACTATAATAGGATG-3' (SEQ. ID. No. 9); Antisense: 5'-
TCTTGGGTGGGTTTCTGTTA-3' (SEQ. ID. No. 10)). The following final
concentrations of components were used in the amplification reaction: 1.5 U Taq
polymerase; 200 µM each of dATP, dCTP, dGTP, dTTP; 1.5 mM MgCl2; 0.4 mM of
each sense and antisense primer; 50-100 ng DNA template; and distilled H2O to a final
volume of 25 µl. The Taq, dNTPs and MgCl2 are supplied in a Ready-To-Go PCR
Bead (Pharmacia 27-9555-01) and were used according to manufacturer's directions.
The PCR reaction was carried out in a Perkin-Elmer 480 GeneAmp or a Perkin-Elmer
2400 thermocycler. Reaction conditions were: denaturing for 1 minute at 94°C, and
then 35 cycles of denaturing at 94°C for 45 sec, annealing at 57°C for 45 sec, and
elongation at 72°C for 45 sec. The resulting PCR product was analyzed on a 1% agarose
gel and compared to a 100 bp ladder to determine the size of the product. Since the
size of the product was as expected (575 bp) and somatic cell hybrid results indicated
that the product is specific for chromosome 17, DNA samples from probands, family
members and controls were amplified and sequenced using a radiolabeled terminator
cycle sequencing kit (Amersham Life Science US79750). The sequencing reaction
was run on a 6% acrylamide sequencing gel (National Diagnostics) and exposed to
Kodak Biomax MS X-ray film for 24-48 hours. After developing the film, the
resulting sequence was compared to the published sequence found in the Entrez
database.
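
As a quick sanity check on the primer pair and the 57°C annealing step, the primers can be examined with a rule-of-thumb melting temperature. The sketch below is an approximation, not the authors' design procedure: it uses the Wallace rule (2°C per A or T, 4°C per G or C), which is only a rough guide for short oligonucleotides, and the function names are hypothetical.

```python
SENSE_PRIMER = "GCATGGACTATAATAGGATG"      # SEQ. ID. No. 9
ANTISENSE_PRIMER = "TCTTGGGTGGGTTTCTGTTA"  # SEQ. ID. No. 10

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_content(primer):
    """Fraction of G and C bases in the primer."""
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rule-of-thumb melting temperature in degrees C (Wallace rule)."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, primer in (("sense", SENSE_PRIMER), ("antisense", ANTISENSE_PRIMER)):
        print(name, len(primer), gc_content(primer), wallace_tm(primer))
    # Sequence the antisense primer anneals to on the sense strand.
    print(reverse_complement(ANTISENSE_PRIMER))
```

By this estimate the two primers have similar melting temperatures close to the annealing temperature used, and the reverse complement of the antisense primer gives the sense-strand sequence expected at the 3' end of the product.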

Example 6 - Association of the newly-discovered alleles with autism spectrum 
disorders. 

Forty-six probands with autistic spectrum disorders and evidence of genetic
causation were selected for analysis. Forty-three had one or more other affected
family members and thirty-five had ear anomalies or neurological deficits consistent
with malfunction of HoxA1 or its paralogs. For comparison, three other groups were
tested:

1) An unstructured control group consisting of adults with no evidence of 
neurological abnormality collected from many different medical centers. These were 
mostly spouses of patients with late onset degenerative diseases of the nervous 
system. The purpose of this group was to determine the frequency of the alleles in the 
general population. 

2) Parent controls - While each of the parents of a proband obviously
transmits half of his or her genetic material to the proband, imaginary individuals with
two alleles constructed from the untransmitted allele of each parent pair should give
an accurate estimate of the frequency of the alleles in the study population, aside from
those transmitted to the probands. Thus, the untransmitted alleles of the parent pair
make a more stringent control, taking into account known and unknown structure in
the local population.

3) Affected family members of probands - When they were available, the
siblings, cousins, parents, or aunts and uncles of probands diagnosed with autism
spectrum disorders or related symptoms (e.g., learning disabilities, language delays,
neurological anomalies of the cranial nerves) were tested. If an allele is associated
with autism, it should be more frequent in probands and affected family members
than in historic or parent controls.
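
The construction of a parent control from untransmitted alleles (group 2 above) can be sketched as simple bookkeeping. This is a minimal illustration, not the authors' analysis code: the function name `untransmitted_alleles` is hypothetical, alleles are written as single letters for the example, and each genotype is given as a pair of alleles at one locus.

```python
from collections import Counter


def untransmitted_alleles(father, mother, child):
    """Return the two parental alleles that were NOT passed to the child.

    The four parental alleles minus the child's two alleles leave one
    untransmitted allele per parent; together they form the genotype of
    the "imaginary individual" used as a parent control.
    """
    first, second = child
    consistent = (first in father and second in mother) or (
        second in father and first in mother
    )
    if not consistent:
        raise ValueError("child genotype is not consistent with the parents")
    remaining = Counter(father) + Counter(mother) - Counter(child)
    return sorted(remaining.elements())


if __name__ == "__main__":
    # Heterozygous father, homozygous wild-type mother, heterozygous child
    # (the pattern described for the family in Example 3; letters are labels).
    print(untransmitted_alleles(("A", "G"), ("A", "A"), ("A", "G")))
```

Because only the multiset of leftover alleles is needed, the result is well defined even when both parents and the child are heterozygous and the parent of origin cannot be assigned.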



Table 1. Percent of individuals with polymorphic forms of HoxA1 and/or HoxB1

                                   HOXA1    HOXB1    HOXA1 or HOXB1

Historic controls (N=101)          16       34       47
Parent controls (N=36)             22       39       55†
Probands with ASD (N=46)           35**     52*      80***
Other affected relatives (N=24)    38*      42       75*

different from historical controls: * = p<.05, ** = p<.01, *** = p<.001
different from probands: † = p<.05



Table 1 demonstrates that parent controls are, indeed, similar to historic controls in
their rates of the polymorphisms under study, while affected family members are
similar to probands. This is especially true when the two functionally-related genes
are combined. Eighty percent of probands have one deviation from the normal
sequence or the other, while only 47% of historical controls have an anomaly. Parent
controls (untransmitted alleles) match the historical controls in their rate of abnormal
alleles, indicating that the local population is not structured differently from the
general population in its rate of these alleles. In contrast, both probands (χ² = 14.83,
p<.001) and other affected family members (χ² = 6.30, p<.02) differ significantly from
historical controls. The probands differ significantly from the parent controls, as well
(χ² = 4.08, p<.05). The probands with genetic anomalies of HoxA1 or HoxB1 are
concordant with the other affected members of the family in 18/22 cases (χ² = 17.82,
p<.001). Finally, both the HoxA1 and HoxB1 polymorphisms are significantly
associated with autism as judged by the Transmission Disequilibrium Test for
Association (Spielman and Ewens, 1996), which compares the rate of transmission
"into the disease" to the 50% rate one would expect in offspring of parents with the
allele of interest. The χ² values for this test are: HoxA1 = 5.16, p<.05; HoxB1 = 4.67,
p<.05.
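
The comparisons with historical controls can be reproduced approximately from Table 1. The sketch below makes two assumptions that are not stated in the text: the counts are reconstructed from the rounded percentages (37 of 46 probands, 18 of 24 affected relatives, and 47 of 101 historic controls carrying either polymorphism), and the statistic is an uncorrected Pearson chi-square on a 2x2 table. The transmission disequilibrium statistic is included in its usual form, (b - c)²/(b + c); the underlying transmission counts are not given in the text, so none are supplied here.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]], e.g. carriers / non-carriers in two groups."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def tdt_statistic(transmitted, untransmitted):
    """Transmission disequilibrium statistic for heterozygous parents:
    counts of the allele of interest transmitted vs. not transmitted."""
    return (transmitted - untransmitted) ** 2 / (transmitted + untransmitted)


if __name__ == "__main__":
    # Counts reconstructed from rounded percentages in Table 1 (assumption).
    probands = chi_square_2x2(37, 9, 47, 54)
    relatives = chi_square_2x2(18, 6, 47, 54)
    print(round(probands, 2), p_value_1df(probands))
    print(round(relatives, 2), p_value_1df(relatives))
```

With these reconstructed counts the two statistics round to the values reported for probands and for other affected relatives, which suggests the reported tests were uncorrected 2x2 chi-squares; the comparison with parent controls is not reproduced here.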

In addition to the living probands, it was of interest to determine the genotype
of the patient whose brain anatomy first suggested the involvement of the Hox genes
in autism (Rodier et al., 1996). Genomic DNA was extracted from the autopsy tissue,
and the patient was determined to have the HoxB1 polymorphism (Stodgell et al., 1998).

One proband is homozygous for the less common allele of HoxA1, and he is
severely affected. He was diagnosed early, at 21 months. None of the historic
controls, and no parents, were homozygous for the polymorphism. Homozygosity of
the HoxB1 polymorphism occurred in two historic controls, one affected parent, and in
two severely-affected probands. Larger samples are needed to determine whether
either polymorphism reduces viability. Three probands have both polymorphisms,
and are severely disabled. The detection and description of the polymorphisms in the
first exons of HoxA1 and HoxB1 and the progress of the association studies have been
described in a book chapter and two abstracts (Rodier, 1998; Ingram et al., 1997;
Stodgell et al., 1998).

Example 7 - Identification of a Second Polymorphism in HoxA1

A third polymorphism has been detected in the homeobox region of HoxA1 in
the second exon. The second exon cannot be amplified by PCR from the DNA of
four probands, indicating that they are homozygous for a deviation from the
published sequence on which the primers for the exon were based. PCR
amplification yields suggest that about ten other probands are
heterozygotes for this polymorphism of the second exon of HoxA1.

Additional primers have been developed that will allow complete sequencing
of the altered region, which appears to be at the 3' end of the homeobox. Once the
sequence is established, a test (such as the use of restriction fragment length
polymorphisms) can be developed to allow rapid evaluation of DNA samples. The degree of
association of this polymorphism with autism spectrum disorders will then be studied
in the same groups already evaluated for the others. Other studies in progress are
designed to examine the second exon of HoxB1 and the non-coding regions of both
genes.

Example 8 - Identification of Additional Polymorphisms in HoxB1 and HoxD1
Associated with Autism

The procedures for evaluating the candidate gene HoxD1, as well as for
finding additional polymorphisms in HoxA1 and HoxB1, will be the same as for those
already identified in HoxA1 and HoxB1. Mutation detection in the coding sequence
of these genes will consist of PCR amplification, cloning and sequencing. Mutation
detection for the entire genes will include large deletion/insertion analysis by
Southern blotting, analysis of 200-400 bp fragments by SSCP or heteroduplex
analysis, and of course cloning and sequencing when heterozygosity becomes
apparent for any region of the genes. See Current Protocols in Human Genetics (John
Wiley & Sons, Inc.), Chapter 7, "Searching Candidate Genes for Mutations."

Biological samples already isolated from patients with autism which did not
show any abnormalities in HoxA1 or HoxB1 will be screened for polymorphisms in
HoxD1.

Although preferred embodiments have been depicted and described in detail 
herein, it will be apparent to those skilled in the relevant art that various 
modifications, additions, substitutions, and the like can be made without departing 
from the spirit of the invention and these therefore are considered within the scope of 
the invention as defined in the claims which follow.

SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: Rodier, Patricia M. 

Ingram, Jennifer L. 
Figlewicz, Denise A. 
Hyman, Susan L. 
Stodgell, Christopher J. 

(ii) TITLE OF INVENTION: GENETIC POLYMORPHISMS WHICH ARE 

ASSOCIATED WITH AUTISM SPECTRUM DISORDERS 

(iii) NUMBER OF SEQUENCES: 10 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Nixon, Hargrave, Devans & Doyle LLP 

(B) STREET: Clinton Square, P.O. Box 1051 

(C) CITY: Rochester 

(D) STATE: New York 

(E) COUNTRY: U.S.A. 

(F) ZIP: 14603-1051 

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: US 60/049,803 

(B) FILING DATE: 17-JUN-1997 

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Goldman, Michael L. 

(B) REGISTRATION NUMBER: 30,727 

(C) REFERENCE/DOCKET NUMBER: 176/60181 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (716) 263-1304 

(B) TELEFAX: (716) 263-1600 



(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1008 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 

ATGGACAATG CAAGAATGAA CTCCTTCCTG GAATACCCCA TACTTAGCAG TGGCGACTCG 60 

GGGACCTGCT CAGCCCGAGC CTACCCCTCG GACCATAGGA TTACAACTTT CCAGTCGTGC 120 

GCGGTCAGCG CCAACAGTTG CGGCGGCGAC GACCGCTTCC TAGTGGGCAG GGGGGTGCAG 180 

ATCGGTTCGC CCCACCACCA CCACCACCAC CACCATCACC ACCCCCAGCC GGCTACCTAC 240 

CAGACTTCCG GGAACCTGGG GGTGTCCTAC TCCCACTCAA GTTGTGGTCC AAGCTATGGC 300 

TCACAGAACT TCAGTGCGCC TTACAGCCCC TACGCGTTAA ATCAGGAAGC AGACGTAAGT 360 

GGTGGGTACC CCCAGTGCGC TCCCGCTGTT TACTCTGGAA ATCTCTCATC TCCCATGGTC 420 

CAGCATCACC ACCACCACCA GGGTTATGCT GGGGGCGCGG TGGGCTCGCC TCAATACATT 480 

CACCACTCAT ATGGACAGGA GCACCAGAGC CTGGCCCTGG CTACGTATAA TAACTCCTTG 540 

TCCCCTCTCC ACGCCAGCCA CCAAGAAGCC TGTCGCTCCC CCGCATCGGA GACATCTTCT 600 

CCAGCGCAGA CTTTTGACTG GATGAAAGTC AAAAGAAACC CTCCCAAAAC AGGGAAAGTT 660 

GGAGAGTACG GCTACCTGGG TCAACCCAAC GCGGTGCGCA CCAACTTCAC TACCAAGCAG 720 

CTCACGGAAC TGGAGAAGGA GTTCCACTTC AACAAGTACC TGACGCGCGC CCGCAGGGTG 780 

GAGATCGCTG CATCCCTGCA GCTCAACGAG ACCCAAGTGA AGATCTGGTT CCAGAACCGC 840 

CGAATGAAGC AAAAGAAACG TGAGAAGGAG GGTCTCTTGC CCATCTCTCC GGCCACCCCG 900 

CCAGGAAACG ACGAGAAGGC CGAGGAATCC TCAGAGAAGT CCAGCTCTTC GCCCTGCGTT 960 

CCTTCCCCGG GGTCTTCTAC CTCAGACACT CTGACTACCT CCCACTGA 1008 

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 335 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Met Asp Asn Ala Arg Met Asn Ser Phe Leu Glu Tyr Pro Ile Leu Ser 
1 5 10 15 

Ser Gly Asp Ser Gly Thr Cys Ser Ala Arg Ala Tyr Pro Ser Asp His 
20 25 30 

Arg Ile Thr Thr Phe Gln Ser Cys Ala Val Ser Ala Asn Ser Cys Gly 
35 40 45 

Gly Asp Asp Arg Phe Leu Val Gly Arg Gly Val Gln Ile Gly Ser Pro 
50 55 60 

His His His His His His His His His His Pro Gln Pro Ala Thr Tyr 
65 70 75 80 

Gln Thr Ser Gly Asn Leu Gly Val Ser Tyr Ser His Ser Ser Cys Gly 
85 90 95 

Pro Ser Tyr Gly Ser Gln Asn Phe Ser Ala Pro Tyr Ser Pro Tyr Ala 
100 105 110 

Leu Asn Gln Glu Ala Asp Val Ser Gly Gly Tyr Pro Gln Cys Ala Pro 
115 120 125 

Ala Val Tyr Ser Gly Asn Leu Ser Ser Pro Met Val Gln His His His 
130 135 140 

His His Gln Gly Tyr Ala Gly Gly Ala Val Gly Ser Pro Gln Tyr Ile 
145 150 155 160 

His His Ser Tyr Gly Gln Glu His Gln Ser Leu Ala Leu Ala Thr Tyr 
165 170 175 

Asn Asn Ser Leu Ser Pro Leu His Ala Ser His Gln Glu Ala Cys Arg 
180 185 190 

Ser Pro Ala Ser Glu Thr Ser Ser Pro Ala Gln Thr Phe Asp Trp Met 
195 200 205 

Lys Val Lys Arg Asn Pro Pro Lys Thr Gly Lys Val Gly Glu Tyr Gly 
210 215 220 

Tyr Leu Gly Gln Pro Asn Ala Val Arg Thr Asn Phe Thr Thr Lys Gln 
225 230 235 240 

Leu Thr Glu Leu Glu Lys Glu Phe His Phe Asn Lys Tyr Leu Thr Arg 
245 250 255 

Ala Arg Arg Val Glu Ile Ala Ala Ser Leu Gln Leu Asn Glu Thr Gln 
260 265 270 

Val Lys Ile Trp Phe Gln Asn Arg Arg Met Lys Gln Lys Lys Arg Glu 
275 280 285 

Lys Glu Gly Leu Leu Pro Ile Ser Pro Ala Thr Pro Pro Gly Asn Asp 
290 295 300 

Glu Lys Ala Glu Glu Ser Ser Glu Lys Ser Ser Ser Ser Pro Cys Val 
305 310 315 320 

Pro Ser Pro Gly Ser Ser Thr Ser Asp Thr Leu Thr Thr Ser His 
325 330 335 
(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1008 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 



ATGGACAATG CAAGAATGAA CTCCTTCCTG GAATACCCCA TACTTAGCAG TGGCGACTCG 60 

GGGACCTGCT CAGCCCGAGC CTACCCCTCG GACCATAGGA TTACAACTTT CCAGTCGTGC 120 

GCGGTCAGCG CCAACAGTTG CGGCGGCGAC GACCGCTTCC TAGTGGGCAG GGGGGTGCAG 180 

ATCGGTTCGC CCCACCACCA CCACCACCAC CACCATCGCC ACCCCCAGCC GGCTACCTAC 240 

CAGACTTCCG GGAACCTGGG GGTGTCCTAC TCCCACTCAA GTTGTGGTCC AAGCTATGGC 300 

TCACAGAACT TCAGTGCGCC TTACAGCCCC TACGCGTTAA ATCAGGAAGC AGACGTAAGT 360 

GGTGGGTACC CCCAGTGCGC TCCCGCTGTT TACTCTGGAA ATCTCTCATC TCCCATGGTC 420 

CAGCATCACC ACCACCACCA GGGTTATGCT GGGGGCGCGG TGGGCTCGCC TCAATACATT 480 

CACCACTCAT ATGGACAGGA GCACCAGAGC CTGGCCCTGG CTACGTATAA TAACTCCTTG 540 

TCCCCTCTCC ACGCCAGCCA CCAAGAAGCC TGTCGCTCCC CCGCATCGGA GACATCTTCT 600 

CCAGCGCAGA CTTTTGACTG GATGAAAGTC AAAAGAAACG CTCCCAAAAC AGGGAAAGTT 660 

GGAGAGTACG GCTACCTGGG TCAACCCAAC GCGGTGCGCA CCAACTTCAC TACCAAGCAG 720 

CTCACGGAAC TGGAGAAGGA GTTCCACTTC AACAAGTACC TGACGCGCGC CCGCAGGGTG 780 

GAGATCGCTG CATCCCTGCA GCTCAACGAG ACCCAAGTGA AGATCTGGTT CCAGAACCGC 840 

CGAATGAAGC AAAAGAAACG TGAGAAGGAG GGTCTCTTGC CCATCTCTCC GGCCACCCCG 900 

CCAGGAAACG ACGAGAAGGC CGAGGAATCC TCAGAGAAGT CCAGCTCTTC GCCCTGCGTT 960 

CCTTCCCCGG GGTCTTCTAC CTCAGACACT CTGACTACCT CCCACTGA 1008 



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 335 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

Met Asp Asn Ala Arg Met Asn Ser Phe Leu Glu Tyr Pro Ile Leu Ser 
1 5 10 15 

Ser Gly Asp Ser Gly Thr Cys Ser Ala Arg Ala Tyr Pro Ser Asp His 
20 25 30 

Arg Ile Thr Thr Phe Gln Ser Cys Ala Val Ser Ala Asn Ser Cys Gly 
35 40 45 

Gly Asp Asp Arg Phe Leu Val Gly Arg Gly Val Gln Ile Gly Ser Pro 
50 55 60 

His His His His His His His His Arg His Pro Gln Pro Ala Thr Tyr 
65 70 75 80 

Gln Thr Ser Gly Asn Leu Gly Val Ser Tyr Ser His Ser Ser Cys Gly 
85 90 95 

Pro Ser Tyr Gly Ser Gln Asn Phe Ser Ala Pro Tyr Ser Pro Tyr Ala 
100 105 110 

Leu Asn Gln Glu Ala Asp Val Ser Gly Gly Tyr Pro Gln Cys Ala Pro 
115 120 125 

Ala Val Tyr Ser Gly Asn Leu Ser Ser Pro Met Val Gln His His His 
130 135 140 

His His Gln Gly Tyr Ala Gly Gly Ala Val Gly Ser Pro Gln Tyr Ile 
145 150 155 160 

His His Ser Tyr Gly Gln Glu His Gln Ser Leu Ala Leu Ala Thr Tyr 
165 170 175 

Asn Asn Ser Leu Ser Pro Leu His Ala Ser His Gln Glu Ala Cys Arg 
180 185 190 

Ser Pro Ala Ser Glu Thr Ser Ser Pro Ala Gln Thr Phe Asp Trp Met 
195 200 205 

Lys Val Lys Arg Asn Pro Pro Lys Thr Gly Lys Val Gly Glu Tyr Gly 
210 215 220 

Tyr Leu Gly Gln Pro Asn Ala Val Arg Thr Asn Phe Thr Thr Lys Gln 
225 230 235 240 

Leu Thr Glu Leu Glu Lys Glu Phe His Phe Asn Lys Tyr Leu Thr Arg 
245 250 255 

Ala Arg Arg Val Glu Ile Ala Ala Ser Leu Gln Leu Asn Glu Thr Gln 
260 265 270 

Val Lys Ile Trp Phe Gln Asn Arg Arg Met Lys Gln Lys Lys Arg Glu 
275 280 285 

Lys Glu Gly Leu Leu Pro Ile Ser Pro Ala Thr Pro Pro Gly Asn Asp 
290 295 300 

Glu Lys Ala Glu Glu Ser Ser Glu Lys Ser Ser Ser Ser Pro Cys Val 
305 310 315 320 

Pro Ser Pro Gly Ser Ser Thr Ser Asp Thr Leu Thr Thr Ser His 
325 330 335 



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1021 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

TGACGCATGG ACTATAATAG GATGAACTCC TTCTTAGAGT ACCCACTCTG TAACCGGGGA 60 

CCCAGCGCCT ACAGCGCCCA CAGCGCCCCA ACCTCCTTTC CCCCAAGCTC GGCTCAGGCG 120 

GTTGACAGCT ATGCAAGCGA GGGCCGCTAC GGTGGGGGGC TGTCCAGCCC TGCGTTTCAG 180 

CAGAACTCCG GCTATCCCGC CCAGCAGCCG CCTTCGACCC TGGGGGTGCC CTTCCCCAGC 240 

TCCGCGCCCT CGGGGTATGC TCCTGCCGCC TGCAGCCCCA GCTACGGGCC TTCTCAGTAC 300 

TACCCTCTGG GTCAATCAGA AGGAGACGGA GGCTATTTTC ATCCCTCGAG CTACGGGGCC 360 

CAGCTAGGGG GCTTGTCCGA TGGCTACGGA GCAGGTGGAG CCGGTCCGGG GCCATATCCT 420 

CCGCAGCATC CCCCTTATGG GAACGAGCAG ACCGCGAGCT TTGCACCGGC CTATGCTGAT 480 

CTCCTCTCCG AGGACAAGGA AACACCCTGC CCTTCAGAAC CTAACACCCC CACGGCCCGG 540 

ACCTTCGACT GGATGAAGGT TAAGAGAAAC CCACCCAAGA CAGCGAAGGT GTCAGAGCCA 600 

GGCCTGGGCT CGCCCAGTGG CCTCCGCACC AACTTCACCA CAAGGCAGCT GACAGAACTG 660 

GAAAAGGAGT TCCATTTCAA CAAGTACCTG AGCCGGGCCC GGAGGGTGGA GATTGCCGCC 720 

ACCCTGGAGC TCAATGAAAC ACAGGTCAAG ATTTGGTTCC AGAACCGACG AATGAAGCAG 780 

AAGAAGCGCG AGCGAGAGGG AGGTCGGGTC CCCCCAGCCC CACCAGGCTG CCCCAAGGAG 840 

GCAGGTGGAG ATGCCTCAGA CCAGTCGACA TGCACCTCCC CGGAAGCCTC ACCCAGCTCT 900 

GTCACCTCCT GAACTGAACC TAGCCACCAA TGGGGCTTCC AGGCACTGGA GCGCCCCAGT 960 

CCAGCCCTAT CCCAGGCTCT CCCAACCCAG GCCTGGCTTC ACTGCCTGGG ATCTCTAGGC 1020 

T 1021 
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 301 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Met Asp Tyr Asn Arg Met Asn Ser Phe Leu Glu Tyr Pro Leu Cys Asn 
1 5 10 15 

Arg Gly Pro Ser Ala Tyr Ser Ala His Ser Ala Pro Thr Ser Phe Pro 
20 25 30 

Pro Ser Ser Ala Gln Ala Val Asp Ser Tyr Ala Ser Glu Gly Arg Tyr 
35 40 45 

Gly Gly Gly Leu Ser Ser Pro Ala Phe Gln Gln Asn Ser Gly Tyr Pro 
50 55 60 

Ala Gln Gln Pro Pro Ser Thr Leu Gly Val Pro Phe Pro Ser Ser Ala 
65 70 75 80 

Pro Ser Gly Tyr Ala Pro Ala Ala Cys Ser Pro Ser Tyr Gly Pro Ser 
85 90 95 

Gln Tyr Tyr Pro Leu Gly Gln Ser Glu Gly Asp Gly Gly Tyr Phe His 
100 105 110 

Pro Ser Ser Tyr Gly Ala Gln Leu Gly Gly Leu Ser Asp Gly Tyr Gly 
115 120 125 

Ala Gly Gly Ala Gly Pro Gly Pro Tyr Pro Pro Gln His Pro Pro Tyr 
130 135 140 

Gly Asn Glu Gln Thr Ala Ser Phe Ala Pro Ala Tyr Ala Asp Leu Leu 
145 150 155 160 

Ser Glu Asp Lys Glu Thr Pro Cys Pro Ser Glu Pro Asn Thr Pro Thr 
165 170 175 

Ala Arg Thr Phe Asp Trp Met Lys Val Lys Arg Asn Pro Pro Lys Thr 
180 185 190 

Ala Lys Val Ser Glu Pro Gly Leu Gly Ser Pro Ser Gly Leu Arg Thr 
195 200 205 

Asn Phe Thr Thr Arg Gln Leu Thr Glu Leu Glu Lys Glu Phe His Phe 
210 215 220 

Asn Lys Tyr Leu Ser Arg Ala Arg Arg Val Glu Ile Ala Ala Thr Leu 
225 230 235 240 

Glu Leu Asn Glu Thr Gln Val Lys Ile Trp Phe Gln Asn Arg Arg Met 
245 250 255 

Lys Gln Lys Lys Arg Glu Arg Glu Gly Gly Arg Val Pro Pro Ala Pro 
260 265 270 

Pro Gly Cys Pro Lys Glu Ala Ala Gly Asp Ala Ser Asp Gln Ser Thr 
275 280 285 

Cys Thr Ser Pro Glu Ala Ser Pro Ser Ser Val Thr Ser 
290 295 300 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1030 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

TGACGCATGG ACTATAATAG GATGAACTCC TTCTTAGAGT ACCCACTCTG TAACCGGGGA 60 

CCCAGCGCCT ACAGCGCCCA CAGCGCCCAC AGCGCCCCAA CCTCCTTTCC CCCAAGCTCG 120 

GCTCAGGCGG TTGACAGCTA TGCAAGCGAG GGCCGCTACG GTGGGGGGCT GTCCAGCCCT 180 

GCGTTTCAGC AGAACTCCGG CTATCCCGCC CAGCAGCCGC CTTCGACCCT GGGGGTGCCC 240 

TTCCCCAGCT CCGCGCCCTC GGGGTATGCT CCTGCCGCCT GCAGCCCCAG CTACGGGCCT 300 

TCTCAGTACT ACCCTCTGGG TCAATCAGAA GGAGACGGAG GCTATTTTCA TCCCTCGAGC 360 

TACGGGGCCC AGCTAGGGGG CTTGTCCGAT GGCTACGGAG CAGGTGGAGC CGGTCCGGGG 420 

CCATATCCTC CGCAGCATCC CCCTTATGGG AACGAGCAGA CCGCGAGCTT TGCACCGGCC 480 

TATGCTGATC TCCTCTCCGA GGACAAGGAA ACACCCTGCC CTTCAGAACC TAACACCCCC 540 

ACGGCCCGGA CCTTCGACTG GATGAAGGTT AAGAGAAACC CACCCAAGAC AGCGAAGGTG 600 

TCAGAGCCAG GCCTGGGCTC GCCCAGTGGC CTCCGCACCA ACTTCACCAC AAGGCAGCTG 660 

ACAGAACTGG AAAAGGAGTT CCATTTCAAC AAGTACCTGA GCCGGGCCCG GAGGGTGGAG 720 

ATTGCCGCCA CCCTGGAGCT CAATGAAACA CAGGTCAAGA TTTGGTTCCA GAACCGACGA 780 

ATGAAGCAGA AGAAGCGCGA GCGAGAGGGA GGTCGGGTCC CCCCAGCCCC ACCAGGCTGC 840 

CCCAAGGAGG CAGCTGGAGA TGCCTCAGAC CAGTCGACAT GCACCTCCCC GGAAGCCTCA 900 

CCCAGCTCTG TCACCTCCTG AACTGAACCT AGCCACCAAT GGGGCTTCCA GGCACTGGAG 960 

CGCCCCAGTC CAGCCCTATC CCAGGCTCTC CCAACCCAGG CCTGGCTTCA CTGCCTGGGA 1020 

TCTCTAGGCT 1030 
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 304 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Met Asp Tyr Asn Arg Met Asn Ser Phe Leu Glu Tyr Pro Leu Cys Asn 
1 5 10 15 

Arg Gly Pro Ser Ala Tyr Ser Ala His Ser Ala His Ser Ala Pro Thr 
20 25 30 

Ser Phe Pro Pro Ser Ser Ala Gln Ala Val Asp Ser Tyr Ala Ser Glu 
35 40 45 

Gly Arg Tyr Gly Gly Gly Leu Ser Ser Pro Ala Phe Gln Gln Asn Ser 
50 55 60 

Gly Tyr Pro Ala Gln Gln Pro Pro Ser Thr Leu Gly Val Pro Phe Pro 
65 70 75 80 

Ser Ser Ala Pro Ser Gly Tyr Ala Pro Ala Ala Cys Ser Pro Ser Tyr 
85 90 95 

Gly Pro Ser Gln Tyr Tyr Pro Leu Gly Gln Ser Glu Gly Asp Gly Gly 
100 105 110 

Tyr Phe His Pro Ser Ser Tyr Gly Ala Gln Leu Gly Gly Leu Ser Asp 
115 120 125 

Gly Tyr Gly Ala Gly Gly Ala Gly Pro Gly Pro Tyr Pro Pro Gln His 
130 135 140 

Pro Pro Tyr Gly Asn Glu Gln Thr Ala Ser Phe Ala Pro Ala Tyr Ala 
145 150 155 160 

Asp Leu Leu Ser Glu Asp Lys Glu Thr Pro Cys Pro Ser Glu Pro Asn 
165 170 175 

Thr Pro Thr Ala Arg Thr Phe Asp Trp Met Lys Val Lys Arg Asn Pro 
180 185 190 

Pro Lys Thr Ala Lys Val Ser Glu Pro Gly Leu Gly Ser Pro Ser Gly 
195 200 205 

Leu Arg Thr Asn Phe Thr Thr Arg Gln Leu Thr Glu Leu Glu Lys Glu 
210 215 220 

Phe His Phe Asn Lys Tyr Leu Ser Arg Ala Arg Arg Val Glu Ile Ala 
225 230 235 240 

Ala Thr Leu Glu Leu Asn Glu Thr Gln Val Lys Ile Trp Phe Gln Asn 
245 250 255 

Arg Arg Met Lys Gln Lys Lys Arg Glu Arg Glu Gly Gly Arg Val Pro 
260 265 270 

Pro Ala Pro Pro Gly Cys Pro Lys Glu Ala Ala Gly Asp Ala Ser Asp 
275 280 285 

Gln Ser Thr Cys Thr Ser Pro Glu Ala Ser Pro Ser Ser Val Thr Ser 
290 295 300 



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(A) DESCRIPTION: /desc = "primer" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

GCATGGACTA TAATAGGATG 20 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(A) DESCRIPTION: /desc = "primer" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

TCTTGGGTGG GTTTCTCTTA 20 



